Formal Redundancy and Consistency Checking Rules for the Lexical Database WordNet 1.5
Abstract
In a manually built semantic net, the position of a concept is determined not automatically by the concept definitions but by the links coded by the lexicographers. In such a net, the formal properties of the encoded attributes and relations provide necessary but not sufficient conditions for maintaining internal consistency and avoiding redundancy. In our experience, the potential of this methodology has not yet been fully exploited, owing to a lack of understanding of the applicable formal rules or to the inflexibility of available software tools. Based on a more comprehensive inquiry performed on the lexical database WordNet™ 1.5, this paper presents a selection of pertinent checking rules and the results of their application to WordNet 1.5. Transferable insights are:
1. Semantic relations that are closely related but differ in a checkable property should be differentiated.
2. Inferable relations, such as the transitive closure of a hierarchical relation or semantic relations induced by lexical ones, need to be taken into account when checking real relations, i.e. directly stored relations.
3. A semantic net needs a proper representation of lexical gaps. A disjunctive hypernym, implemented as a set of hypernyms, is considered harmful.

1 Introduction

When building large-scale lexical/semantic resources, subsequent or, better, simultaneous validation of content is essential. Basic validation includes formal redundancy and consistency checks. Taking WordNet 1.5 as an example, we illustrate the development and application of such rules. The computational environment for this enquiry is TerminologyFramework, an object-oriented generic tool developed to represent and consistently maintain concept-oriented dictionaries of different types, including WordNet [Fischer et al., 1996].
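To make the second transferable insight from the abstract concrete, the following is a minimal sketch of one possible redundancy check: a directly stored hypernym link is reported as redundant when the same concept pair is already connected by a longer path, i.e. when the link is implied by the transitive closure of the remaining links. The function name `redundant_links` and the toy data are assumptions made for illustration; this is not TerminologyFramework's implementation, and the example links are not taken from WordNet 1.5.

```python
from collections import defaultdict


def redundant_links(direct_links):
    """Return stored (child, parent) links that are also implied by a longer path.

    Hypothetical helper for illustration only.
    """
    graph = defaultdict(set)
    for child, parent in direct_links:
        graph[child].add(parent)

    def reachable(start, skip_edge):
        # Depth-first search over stored links, ignoring the one link under test.
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            for parent in graph.get(node, ()):
                if (node, parent) == skip_edge or parent in seen:
                    continue
                seen.add(parent)
                stack.append(parent)
        return seen

    return sorted(
        (child, parent)
        for child, parent in set(direct_links)
        if parent in reachable(child, (child, parent))
    )


# Toy data, invented for this sketch.
links = [
    ("poodle", "dog"),
    ("dog", "animal"),
    ("poodle", "animal"),  # already implied by poodle -> dog -> animal
]
print(redundant_links(links))
```

Testing each link against the graph with that one link left out keeps the check independent of the order in which links were stored. On a resource of WordNet's size, a precomputed closure or a database query would likely be preferable to this per-link search.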
The content of WordNet 1.5 was downloaded into this system from the "dict/data.*" files. Some checks were performed during download as part of the operational semantics of our relation definitions, but most were simulated a posteriori by database queries. The main aim of the enquiry is not to produce an error report on WordNet 1.5 but to develop a reusable methodology of redundancy and consistency checks. Accordingly, we have not only checked WordNet 1.5 but also used it to develop our ideas and to test their validity and relevance. Compared to the thesauri we had previously modeled and downloaded, WordNet 1.5 offers a richer set of semantic and lexical relations, which gives rise to new questions of redundancy and consistency (cf. [Fischer, 1993]). The more relations are introduced into a manually built net, the more dependencies arise that hold each other in check. Once these are made explicit, formulated as guidelines and implemented, they can be used to support internal consistency, which is a necessary but not sufficient condition for the correctness of a semantic net.

In the following, after characterizing in general the status of formal checks in semantic nets of the WordNet type, we present and comment on a series of constellations that give exemplary insight into the topic.

How to read the pictures in this paper

All pictures in this paper are screen snapshots, clipped from a window of TerminologyFramework's graphical browser [Möhr and Rostek, 1993]. The original, and hence uncorrected, WordNet 1.5 data is shown as a graph in which a node may represent a concept (i.e. a synset, named by its first synset element) or occasionally a term, i.e. a synset element linked to its concept by the designation relation. This relation is shown by its inverse from the concept side and is thus labeled term.
Terms are represented in TerminologyFramework as unique objects. Each has the word or phrase of the synset element as its (possibly homographic) name, together with a system-generated and system-maintained homograph counter, which is stored separately from the name string as another term attribute. A term node label is distinguished from a concept label by a lowercase prefix indicating the part of speech (hence N for noun concepts and n for noun terms, V for verb concepts and v for verb terms). A concept's label body takes its content from its first synset element which is also transformed on download into a term with this label
Publication date: 1997